2 Exploratory data analysis

During our one-week EDA course I worked with a large amount of data. We were provided with a runoff database for the river Rhine. In this presentation I show the main data manipulations and some of the final results.

2.1 Summary of values

## Warning in melt.data.table(runoff_stats, id.vars = "sname", by = sname):
## 'measure.vars' [mean_day, sd_day, min_day, max_day, median] are not all of the
## same type. By order of hierarchy, the molten data value column will be of type
## 'list'. All measure variables not of type 'list' will be coerced too. Check
## DETAILS in ?melt.data.table for more on coercion.
summary statistics (daily runoff per station)
sname  mean_day  sd_day  min_day  max_day
REES   2251      1112    500      11700
DUES   2126      1078    464      11000
KOEL   2086      1039    401      10900
ANDE   2039      1057    560      10400
KAUB   1654      745     482      7160
MAIN   1612      707     460      6920
SPEY   1276      518     364      4410
WORM   1415      599     415      5400
MAXA   1253      529     340      4340
RHEI   1031      436     259      4219
LOBI   2218      1134    575      13000
BASR   1044      457     272      5530
RHEM   1031      435     315      4220
REKI   441       193     120      1872
NEUF   369       168     104      1167
DOMA   117       97      11       1563
DIER   230       167     40       2028

median: not available per station. In the melted output every station row holds the same 20-element vector, c(1920, 1888.56, 1850, 1790, 1500, 1471.93, 1180, 1300, 1145.456, 954, 1950, 979, 968, 969, 955, 401.6785, 330, 83.202, 78.8, 172.488), which is consistent with the 'list' coercion reported in the warning above. The medians were most likely not reduced to one value per station before melting.
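The warning and the repeated median vector suggest that the `median` column of `runoff_stats` was not a single number per station. A minimal sketch of one way to build the table so that `melt()` gets columns of one type is shown here. The input table `runoff_day` and its `value` column are assumptions for illustration, and the numbers are made up; only `runoff_stats`, `sname` and the statistic names come from the output above.

```r
library(data.table)

# Hypothetical toy data: daily runoff for two stations (values are made up).
runoff_day <- data.table(
  sname = rep(c("DOMA", "KOEL"), each = 4),
  value = c(10, 20, 30, 40, 1000, 2000, 3000, 5000)
)

# Aggregate per station. as.numeric() keeps every statistic of type double,
# so melt() does not have to coerce mixed column types.
runoff_stats <- runoff_day[, .(
  mean_day = as.numeric(mean(value)),
  sd_day   = as.numeric(sd(value)),
  min_day  = as.numeric(min(value)),
  max_day  = as.numeric(max(value)),
  median   = as.numeric(median(value))
), by = sname]

# Long format: one row per station and statistic.
runoff_long <- melt(runoff_stats, id.vars = "sname")
print(runoff_long)
```

Grouping with `by = sname` inside the aggregation is what yields one median per station; with that in place the molten `value` column stays numeric.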

2.2 Stations before and after 2000

These plots make it easy to show and work with the maximum and minimum runoff over the whole period.
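The numbers behind such a comparison could be prepared roughly as follows. This is only a sketch: the table `runoff_day`, its `date` and `value` columns, the period labels and the exact cutoff (dates from 2000-01-01 onward count as the later period) are assumptions, and the values are made up.

```r
library(data.table)

# Hypothetical toy data: one value per station and date (values are made up).
runoff_day <- data.table(
  sname = rep(c("DOMA", "KOEL"), each = 4),
  date  = as.Date(rep(c("1995-01-01", "1995-07-01",
                        "2005-01-01", "2005-07-01"), 2)),
  value = c(50, 200, 60, 150, 1500, 2500, 1800, 2300)
)

# Label each observation by period (assumed cutoff: calendar year 2000).
runoff_day[, period := fifelse(year(date) < 2000, "pre_2000", "after_2000")]

# Minimum and maximum runoff per station and period.
runoff_range <- runoff_day[, .(min_day = min(value), max_day = max(value)),
                           by = .(sname, period)]
print(runoff_range)
```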

2.3 Analysis of specific stations for each month

This plot shows how runoff changes from month to month at stations at different altitudes along the river. For this problem we took three stations: DOMA and BASR, which are at higher altitudes, and KOEL, which is at a lower altitude. The results show a greater divergence from the mean at the higher-altitude stations (smaller in summer, greater in winter); this holds true for all stations.
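A monthly comparison of the three stations could be set up along these lines. The sketch assumes a daily table `runoff_day` with `sname`, `date` and `value` columns (hypothetical names) and uses made-up values; summarising each month by its mean and standard deviation is one possible reading of the plot, not necessarily what was plotted.

```r
library(data.table)

stations <- c("DOMA", "BASR", "KOEL")

# Hypothetical toy data: two January and two July values per station (made up).
runoff_day <- data.table(
  sname = rep(c("DOMA", "BASR", "KOEL", "REES"), each = 4),
  date  = as.Date(rep(c("2001-01-10", "2001-01-20",
                        "2001-07-10", "2001-07-20"), 4)),
  value = c(5, 7, 20, 30,
            800, 900, 1400, 1600,
            2500, 2700, 1900, 2100,
            2600, 2800, 2000, 2200)
)

# Keep the three stations and summarise runoff per station and calendar month.
runoff_month <- runoff_day[sname %in% stations,
                           .(mean_month = mean(value), sd_month = sd(value)),
                           by = .(sname, month = month(date))]
print(runoff_month)
```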

2.4 Average runoff at specific stations

These are the same three stations as in the previous plot. We can see only small changes between the periods before and after 2000.
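To put a number on "small changes", the mean runoff per period can be compared directly. The following is a sketch under assumptions: `runoff_day`, its columns, the period labels and the `change_pct` column are hypothetical, and the values are made up rather than taken from the Rhine data.

```r
library(data.table)

# Hypothetical toy data: two values before and two after 2000 per station.
runoff_day <- data.table(
  sname = rep(c("DOMA", "BASR", "KOEL"), each = 4),
  date  = as.Date(rep(c("1990-03-01", "1995-09-01",
                        "2003-03-01", "2008-09-01"), 3)),
  value = c(100, 140, 110, 130,
            1000, 1100, 1040, 1060,
            2000, 2200, 2100, 2142)
)

# Assumed cutoff: calendar year 2000.
runoff_day[, period := fifelse(year(date) < 2000, "pre_2000", "after_2000")]

# One row per station, one column per period, holding the mean runoff.
runoff_period <- dcast(runoff_day, sname ~ period,
                       value.var = "value", fun.aggregate = mean)

# Relative change of the mean between the two periods, in percent.
runoff_period[, change_pct := 100 * (after_2000 - pre_2000) / pre_2000]
print(runoff_period)
```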